Compiler-assisted Hybrid Operand Communication

Authors

  • Dong Li
  • Behnam Robatmili
  • Madhu Saravana Sibi Govindan
  • Aaron Smith
  • Steve Keckler
  • Doug Burger
Abstract

Communication of operands among in-flight instructions can be power intensive, especially in superscalar processors where all result tags are broadcast to a small number of consumers through a multi-entry CAM. Token-based point-to-point communication of operands in dataflow architectures is highly efficient when each produced token has only one consumer, but inefficient when there are many consumers, because software fanout trees must be constructed. Placing operands in registers is efficient for broadcasting values whose consumers are spread over a long lifetime, but inefficient for shorter-lived operations. This paper evaluates a compiler-assisted hybrid instruction communication model that combines token-based instruction communication with statically assigned broadcast tags. Each fixed-size block of code is given a small number of architectural broadcast identifiers, which the compiler can assign to producers that have many consumers. Producers with few consumers rely on point-to-point communication through tokens. Producers whose result is live past the instruction block communicate with distant consumers through a register. Having the compiler select the mechanism statically relieves the hardware of categorizing instructions at run time. At the same time, a compiler can categorize instructions better than dynamic selection does because the compiler analyzes a larger range of instructions. Furthermore, the compiler can perform complex optimizations without hardware cost or execution-time penalty. We propose a compiler optimization that reuses broadcast tags for instructions with non-overlapping broadcast live ranges, which further improves the speedup without consuming more power. The results show that this compiler-assisted hybrid token/broadcast model requires only eight architectural broadcasts per block, enabling highly efficient CAMs.
This hybrid model reduces instruction communication energy by 28% compared to a strictly token-based dataflow model (and by over 2.7X compared to a hybrid model without compiler support), while simultaneously increasing performance by 8% on average across the SPECINT and EEMBC benchmarks, running as single threads on 16 composed, dual-issue EDGE cores.
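
The three-way selection and the tag-reuse optimization described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' compiler pass: the `Producer` record, the `FANOUT_THRESHOLD` cutoff, the linear-scan tag allocation, and the fallback to tokens when no tag is free are all assumptions made for illustration. Only the limit of eight broadcast tags per block comes from the abstract.

```python
from dataclasses import dataclass, field

MAX_BROADCAST_TAGS = 8   # from the abstract: eight architectural broadcasts per block
FANOUT_THRESHOLD = 2     # assumption: more in-block consumers than this favors broadcast


@dataclass
class Producer:
    """Hypothetical summary of one value-producing instruction in a block."""
    name: str
    def_pos: int                                       # position of the producer in the block
    consumer_pos: list = field(default_factory=list)   # positions of in-block consumers
    live_out: bool = False                             # result is needed after the block


def assign_mechanisms(producers, num_tags=MAX_BROADCAST_TAGS,
                      threshold=FANOUT_THRESHOLD):
    """Pick a communication mechanism for each producer.

    A broadcast live range is taken to span from the producer to its last
    in-block consumer. A tag is reused once its previous range has ended,
    i.e. when the two ranges do not overlap.
    """
    busy_until = [-1] * num_tags   # last position at which each tag is in use
    result = {}
    for p in sorted(producers, key=lambda q: q.def_pos):
        mechanisms = []
        if len(p.consumer_pos) > threshold:
            # Many consumers: look for a tag whose previous range is over.
            tag = next((t for t, end in enumerate(busy_until)
                        if end < p.def_pos), None)
            if tag is not None:
                busy_until[tag] = max(p.consumer_pos)
                mechanisms.append(f"broadcast:{tag}")
            else:
                # No free tag: fall back to tokens (a software fanout tree).
                mechanisms.append("token")
        elif p.consumer_pos:
            # Few consumers: point-to-point tokens.
            mechanisms.append("token")
        if p.live_out:
            # Value outlives the block: also write it to a register.
            mechanisms.append("register")
        result[p.name] = mechanisms
    return result


block = [
    Producer("a", 0, [1, 2, 3]),
    Producer("b", 1, [2]),
    Producer("c", 4, [5, 6, 7], live_out=True),
    Producer("d", 5, [6, 7, 8]),
    Producer("e", 9, [], live_out=True),
]
plan = assign_mechanisms(block, num_tags=1)
```

With a single tag available, `a` and `c` share tag 0 because the range of `a` ends before `c` is defined, while `d` overlaps `c` and falls back to tokens. A real compiler would presumably rank candidates by expected benefit rather than take them in program order.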


Related resources

Compiler-assisted multiple instruction rollback recovery using a read buffer - Computers, IEEE Transactions on

Abstract: Multiple instruction rollback (MIR) is a technique that has been implemented in mainframe computers to provide rapid recovery from transient processor failures. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs have also been developed which remove rollback data hazards directly with data-flow tran...


Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer

Multiple instruction rollback (MIR) is a technique that has been implemented in mainframe computers to provide rapid recovery from transient processor failures. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs have also been developed which remove rollback data hazards directly with data-flow transformations. ...


Capacity Enhancement in Hybrid Wireless Relay Network with Network Coding

Network coding technique increases wireless network communication efficiency. Wireless multihop relay network has been shown to achieve capacity gain over conventional single-hop wireless networks. Hybrid wireless relay networks integrate multihop ad hoc relay and infrastructure base stations to achieve better wireless network performance. Applying the promising network coding technique to hybr...


Compiler assisted Data Forwarding in VLIW/EPIC architectures

This paper proposes a mechanism for reducing the complexity of forwarding hardware in VLIW/EPIC processors. The necessary information for data forwarding is known at compile time. This paper proposes a way to incorporate the forwarding information along with the instruction itself, thereby reducing the hardware complexity of forwarding logic with implications for power saving and reducing chip ...


Integrating Fine-Grained Message Passing in Cache Coherent Shared Memory Multiprocessors

This paper considers the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency caused by interprocessor communication in cache coherent, shared memory multiprocessors. Data prefetching is accomplished by using a multiprocessor software pipelined algorithm. Data forwarding is used to target interprocessor data communication, rather than synchronizatio...



Publication date: 2009